AITopics | rare class

SIMILAR: SubmodularInformationMeasuresBased ActiveLearningInRealisticScenarios

Neural Information Processing SystemsFeb-10-2026, 04:27:38 GMT

Active learning has proven to be useful for minimizing labeling costs by selecting the most informative samples.

artificial intelligence, learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Exemplar Guided Active Learning

Neural Information Processing SystemsDec-24-2025, 08:36:13 GMT

We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. For example, consider the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in the corpus because the sense is rare in modern English; and conversely there may exist true labels that do not exist in our knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each "common class" that occurs with frequency above a given threshold in the unlabeled set while annotating as few examples as possible from "rare classes" whose labels occur with less than this frequency. The challenge is that we are not informed which labels are common and which are rare, and the true label distribution may exhibit extreme skew. We describe an active learning approach that (1) explicitly searches for rare classes by leveraging the contextual embedding spaces provided by modern language models, and (2) incorporates a stopping rule that ignores classes once we prove that they occur below our target threshold with high probability. We prove that our algorithm only costs logarithmically more than a hypothetical approach that knows all true label frequencies and show experimentally that incorporating automated search can significantly reduce the number of samples needed to reach target accuracy levels.

exemplar guided active learning, knowledge base, name change, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Supplementary Material for SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios Table of Contents

Neural Information Processing SystemsNov-15-2025, 06:33:34 GMT

Our experiments utilize contributions from existing code repositories.

complexity, dataset, experiment, (13 more...)

Neural Information Processing Systems

Genre: Collection (0.40)

Industry: Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Exemplar Guided Active Learning

Neural Information Processing SystemsAug-22-2025, 00:29:30 GMT

However, this label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that do not occur in the corpus because the sense is rare in modern English; conversely, there may also exist true labels that do not exist in our knowledge base. For example, consider the word "bass."

dataset, learning, rare class, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

9af08cda54faea9adf40a201794183cf-Paper.pdf

Neural Information Processing SystemsAug-16-2025, 08:25:17 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Exemplar Guided Active Learning

Neural Information Processing SystemsMay-27-2025, 06:41:44 GMT

We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. For example, consider the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in the corpus because the sense is rare in modern English; and conversely there may exist true labels that do not exist in our knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each "common class" that occurs with frequency above a given threshold in the unlabeled set while annotating as few examples as possible from "rare classes" whose labels occur with less than this frequency. The challenge is that we are not informed which labels are common and which are rare, and the true label distribution may exhibit extreme skew.

exemplar guided active learning, machine learning, natural language, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Exponentially Weighted Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection Model Training in Unmanned Aerial Vehicles Surveillance Scenarios

Ahmed, Taufiq, Kumar, Abhishek, Casado, Constantino Álvarez, Zhang, Anlan, Hänninen, Tuomo, Loven, Lauri, López, Miguel Bordallo, Tarkoma, Sasu

arXiv.org Artificial IntelligenceMar-27-2025

Object detection models often struggle with class imbalance, where rare categories appear significantly less frequently than common ones. Existing sampling-based rebalancing strategies, such as Repeat Factor Sampling (RFS) and Instance-Aware Repeat Factor Sampling (IRFS), mitigate this issue by adjusting sample frequencies based on image and instance counts. However, these methods are based on linear adjustments, which limit their effectiveness in long-tailed distributions. This work introduces Exponentially Weighted Instance-Aware Repeat Factor Sampling (E-IRFS), an extension of IRFS that applies exponential scaling to better differentiate between rare and frequent classes. E-IRFS adjusts sampling probabilities using an exponential function applied to the geometric mean of image and instance frequencies, ensuring a more adaptive rebalancing strategy. We evaluate E-IRFS on a dataset derived from the Fireman-UAV-RGBT Dataset and four additional public datasets, using YOLOv11 object detection models to identify fire, smoke, people and lakes in emergency scenarios. The results show that E-IRFS improves detection performance by 22\% over the baseline and outperforms RFS and IRFS, particularly for rare categories. The analysis also highlights that E-IRFS has a stronger effect on lightweight models with limited capacity, as these models rely more on data sampling strategies to address class imbalance. The findings demonstrate that E-IRFS improves rare object detection in resource-constrained environments, making it a suitable solution for real-time applications such as UAV-based emergency monitoring.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2503.21893

Country:

North America > United States > California (0.14)
Europe > Finland > Northern Ostrobothnia > Oulu (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Robotics & Automation (0.40)
Aerospace & Defense > Aircraft (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.40)

Add feedback

Review for NeurIPS paper: Exemplar Guided Active Learning

Neural Information Processing SystemsJan-26-2025, 21:02:36 GMT

Why is the sampling strategy switched to uncertainty sampling once an example is collected? Is it because after the classifier seeing one example in a rare class, it could start to give high uncertainty to the rare class? If that is the case, I do not understand why we cannot use the initial example (in WordNet?), which we assume to be available at the beginning, to train the classifier and directly use uncertainty sampling at the beginning. Minor questions: 1. Did you try to use cosine distance rather than L2 distance in the guided search? It might improve the performance a little?

exemplar guided active learning, neurips paper, rebuttal, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models

You, Weiqiu, Park, Youngja

arXiv.org Artificial IntelligenceNov-27-2024

Understanding the attack patterns associated with a cyberattack is crucial for comprehending the attacker's behaviors and implementing the right mitigation measures. However, majority of the information regarding new attacks is typically presented in unstructured text, posing significant challenges for security analysts in collecting necessary information. In this paper, we present a sentence classification system that can identify the attack techniques described in natural language sentences from cyber threat intelligence (CTI) reports. We propose a new method for utilizing auxiliary data with the same labels to improve classification for the low-resource cyberattack classification task. The system first trains the model using the augmented training data and then trains more using only the primary data. We validate our model using the TRAM data1 and the MITRE ATT&CK framework. Experiments show that our method enhances Macro-F1 by 5 to 9 percentage points and keeps Micro-F1 scores competitive when compared to the baseline performance on the TRAM dataset.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.18755

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

Add feedback

Exemplar Guided Active Learning

Neural Information Processing SystemsOct-10-2024, 21:10:47 GMT

We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. For example, consider the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in the corpus because the sense is rare in modern English; and conversely there may exist true labels that do not exist in our knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each "common class" that occurs with frequency above a given threshold in the unlabeled set while annotating as few examples as possible from "rare classes" whose labels occur with less than this frequency. The challenge is that we are not informed which labels are common and which are rare, and the true label distribution may exhibit extreme skew.

exemplar guided active learning, knowledge base, rare class, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

rare class

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

SIMILAR: SubmodularInformationMeasuresBased ActiveLearningInRealisticScenarios

Exemplar Guided Active Learning

Supplementary Material for SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios Table of Contents

Exemplar Guided Active Learning

9af08cda54faea9adf40a201794183cf-Paper.pdf

Exemplar Guided Active Learning

Exponentially Weighted Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection Model Training in Unmanned Aerial Vehicles Surveillance Scenarios

Review for NeurIPS paper: Exemplar Guided Active Learning

Cyber-Attack Technique Classification Using Two-Stage Trained Large Language Models

Exemplar Guided Active Learning